SwePub
Tyck till om SwePub Sök här!
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "db:Swepub ;pers:(Jantsch Axel);pers:(Chen Xiaowen);srt2:(2011)"

Sökning: db:Swepub > Jantsch Axel > Chen Xiaowen > (2011)

  • Resultat 1-6 av 6
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Candaele, Bernard, et al. (författare)
  • The MOSART Mapping Optimization for multi-core Architectures
  • 2011
  • Ingår i: VLSI 2010 Annual Symposium. - Dordrecht : Springer Publishing Company. ; , s. 181-195
  • Konferensbidrag (refereegranskat)abstract
    • MOSART project addresses two main challenges of prevailing architectures: (i) Theglobal interconnect and memory bottleneck due to a single, globally shared memorywith high access times and power consumption; (ii) The difficulties in programmingheterogeneous, multi-core platforms MOSART aims to overcome these through amulti-core architecture with distributed memory organization, a Network-on-Chip(NoC) communication backbone and configurable processing cores that are scaled,optimized and customized together to achieve diverse energy, performance, cost andsize requirements of different classes of applications. MOSART achieves this by:(i) Providing platform support for management of abstract data structures includingmiddleware services and a run-time data manager for NoC based communicationinfrastructure; (ii) Developing tool support for parallelizing and mapping applicationson the multi-core target platform and customizing the processing cores for theapplication.
  •  
2.
  • Chen, Xiaowen, et al. (författare)
  • Cooperative communication based barrier synchronization in on-chip mesh architectures
  • 2011
  • Ingår i: IEICE Electronics Express. - : Institute of Electronics, Information and Communications Engineers (IEICE). - 1349-2543. ; 8:22, s. 1856-1862
  • Tidskriftsartikel (refereegranskat)abstract
    • We propose cooperative communication as a means to enable efficient and scalable barrier synchronization on mesh-based many-core architectures. Our approach is different from but orthogonal to conventional algorithm-based optimizations. It relies on collaborating routers to provide efficient gather and multicast communication. In conjunction with a master-slave algorithm, it exploits the mesh regularity to achieve efficiency. The gather and multicast functions have been implemented in our router. Synthesis results suggest marginal area overhead. With synthetic and benchmark experiments, we show that our approach significantly reduces synchronization completion time and increases speedup.
  •  
3.
  • Chen, Xiaowen, et al. (författare)
  • Hybrid distributed shared memory space in multi-core processors
  • 2011
  • Ingår i: Journal of Software. - : International Academy Publishing (IAP). - 1796-217X. ; 6:12 SPEC. ISSUE, s. 2369-2378
  • Tidskriftsartikel (refereegranskat)abstract
    • On multi-core processors, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtualto- Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart. The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.
  •  
4.
  • Jantsch, Axel, et al. (författare)
  • Memory Architecture and Management in an NoC Platform
  • 2011. - 1
  • Ingår i: Scalable Multi-core Architectures. - New York, NY : Springer. - 9781441967770 ; , s. 3-28
  • Bokkapitel (refereegranskat)abstract
    • The memory organization and the management of the memory space is a critical part of every NoC based platform design. We propose a Data Management Engine (DME), that is a block of programmable hardware and part of every processing element. It off-loads the processing element (CPU, DSP, etc.) by managing the memory space, memory access and the communication over the on-chip network. The DME’s main functions are virtual address translation, private and shared memory management, cache coherence protocol, support for memory consistency models, synchronization and protection mechanisms for shared memory communication. The DME is fully programmable and configurable thus allowing for customized support for high level data management functions such as dynamic memory allocation and abstract data types. This chapter describes the main concepts, design and functionality of the DME and presents case studies illustrating its usage and performance.
  •  
5.
  • Naeem, Abdul, et al. (författare)
  • Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems
  • 2011
  • Ingår i: Proceedings of 16th ACM/IEEE Asia and South Pacific Design Automation Conference(ASP-DAC) 2011. - : IEEE Press. ; , s. 154-159
  • Konferensbidrag (refereegranskat)abstract
    • This paper studies realization and performance comparison of the sequential and weak consistency models in the network-on-chip (NoC) based distributed shared memory (DSM) multi-ore systems. Memory consistency constrains the order of shared memory operations for the expected behavior of the multi-core systems. Both the consistency models are realized in the NoC based multi-core systems. The performance of the two consistency models are compared for various sizes of networks using regular mesh topologies and deflection routing algorithm. The results show that the weak consistency improves the performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to relaxation in the program order, as the system grows from single core to 64 cores.
  •  
6.
  • Naeem, Abdul, et al. (författare)
  • Realization and Scalability of Release and Protected Release Consistency Models in NoC based Systems
  • 2011
  • Ingår i: Proceeding of 14th Euromicro Conference on Digital System Design, 2011. - Oulu : IEEE Computer Society. - 9781457710483 ; , s. 47-54
  • Konferensbidrag (refereegranskat)abstract
    • This paper studies the realization and scalability of release and protected release consistency models in Network-on-Chip (NoC) based Distributed Shared Memory (DSM) multi-core systems. The protected release consistency (PRC) model is proposed as an extension of the release consistency (RC) model and provides further relaxation in the shared memory operations. The realization schemes of RC and PRC models use a transaction counter in each node of the NoC based multi-core (McNoC) systems. Further, we study the scalability of these RC and PRC models and evaluate their performance in the McNoC platform. A configurable NoC based platform with 2D mesh topology and deflection routing algorithm is used in the tests. We experiment both with synthetic and application workloads. The performance of the RC and PRC models are compared using sequential consistency (SC) as the baseline. The experiments show that the average code execution time for the PRC model in 8x8 network (64 cores) is reduced by 30.5% over SC, and by 6.5% over RC model. Average data execution time in the 8x8 network for the PRC model is reduced by almost 37% over SC and by 8.8% over RC. The increase in area for the PRC of RC is about 880 gates in the network interface ( 1.7% ).
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-6 av 6

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy